I.  Report  Organization




Corresponding  to  the  two  Air  Force  agencies  that  supported 
this  research,  the  report  is  presented  in  two  parts. 

(1) PART I: Numerical Studies in Computer-Aided Design.
Sponsored by AFOSR, this parent study involved general
research in the mathematical modeling of vector processors
and the development of vectorized sparse matrix
procedures.

(2)  PART  II:  Studies  in  High  Speed  Computation.  Funded 
by  AFFDL  in  the  fourth  and  fifth  years  of  the  parent 
study,  this  research  involved  a  more  detailed  study  of 
the  CRAY-1  processor  and  its  application  to  simulation 
of  aerodynamic  fluid  flow. 

II.  Summary  of  Research 

This grant was originally concerned solely with the development
of algorithms and the evaluation of algorithmic complexity
for the direct (as opposed to iterative) solution of large sets of
sparse simultaneous equations on vector processors. This analytical
study was preceded by benchmarks of a number of commercial
processors. The speedups observed in these benchmark studies,
especially for the CRAY-1, suggested the subsequent CRAY-1 benchmark
study on 2-D and 3-D aerodynamic fluid flow. These two studies
found common ground in the development of a CRAY-1 logical/timing
simulator, which greatly facilitated both algorithm development
and the coding of benchmarks.

The principal accomplishments of this grant are summarized in
the following.


PART  I 


A.  Benchmarks 

In 1976, a report [27] was prepared on the
mathematical modeling of a number of early commercial vector
processors (Texas Instruments ASC, Control Data STAR 100, and Cray
Research CRAY-1). This was the earliest public benchmark of these
processors and, even though it involved only a simple linear equation
solver, several hundred copies of the benchmark report were requested.
The results are summarized in Figure 1.

B.  Software  for  sparse  solution 

In 1977, a Fortran-coded vectorized version of a well-known
scalar sparse equation solution algorithm [29] was developed. This
report anticipated by several years the exploitation of vector
processing for the solution of sparse problems. Implemented in Fortran,
however, it could not utilize the particular data flow characteristics
of a memory-hierarchical machine such as the CRAY-1. Indeed, its
maximum processing speed is limited to approximately 1/4 the maximum speed
of the CRAY-1, in spite of its vectorized formulation [1][4].

In 1979, an assembly-coded block-oriented sparse solver was
developed [4][5] to exploit the data flow of the CRAY-1. It was
found that, since the memory hierarchy of the CRAY-1 requires blocking
of even dense matrices, the same general sparse solver could be
used to solve banded, blocked tridiagonal, and full matrices, with
little overhead for generality and with the high efficiency of
assembly coding. This appears to represent a new application of sparse
solvers, i.e., the replacement of a number of assembly codes written
for specific sparsity structures.


C.  Complexity  of  vectorized  sparse  solution

In 1975-6 [10][28], the concept of average vector length was
proposed as a useful representation of the vectorizability of the
solution of finite element grids. Complexity formulas and timing
estimates were given for the solution of grids. Example results are
given in Figure 2.
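The idea of an average vector length can be illustrated with a small sketch. It is written in Python for readability, not taken from [10] or [28], and it rests on two assumptions that the report does not state: that the average is the plain mean of the lengths of the vector operations issued, and that one vector primitive is one row update in Gaussian elimination of a full band matrix without pivoting. The function names are hypothetical.

```python
def average_vector_length(lengths):
    """Mean length of the vector operations in a trace (assumed definition)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


def banded_elimination_trace(n, half_bandwidth):
    """Lengths of the row-update vectors in elimination of an n x n band matrix.

    Assumptions (not from the report): no pivoting, a full band, and one
    vector primitive per updated row.
    """
    trace = []
    for k in range(n - 1):
        # Number of rows below the pivot that lie inside the band; each of
        # those rows is updated over the same number of columns.
        m = min(half_bandwidth, n - 1 - k)
        trace.extend([m] * m)
    return trace


if __name__ == "__main__":
    for n, b in [(100, 5), (100, 20), (100, 99)]:
        avg = average_vector_length(banded_elimination_trace(n, b))
        print(f"n={n:4d}  half-bandwidth={b:3d}  average vector length={avg:.2f}")
```

Under these assumptions a narrow band caps the average near the half-bandwidth, while a full matrix gives an average that grows with n, which is the kind of comparison the average vector length is meant to make visible. A different choice of vector primitive would give different numbers.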

D.  Equation  ordering  for  vector  solution

In 1979 [1][2], it was observed that, in the solution of sparse
equations, vectors resulted either from exploiting the local density
of a sparse matrix or from a global pattern associated with the problem
structure (or, equivalently, the associated graph of the matrix). It
was shown that certain symmetry-exploiting operations on the graph,
such as folding and rotation, yielded an equation ordering which
resulted in vector operations in the solution of the equations. For
example, the reordering of the matrix of Figure 3 yields the "striped"
structure of Figure 4, which is shown in [2] to be more amenable to
vectorized solution.
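How an equation ordering can expose vector operations may be seen in a much simpler setting than the folding and rotation operations of [2]. The sketch below swaps in the textbook odd-even ordering of a tridiagonal system, which is not the method of the report: after the symmetric permutation, each diagonal block is itself diagonal, so the unknowns in the first block can be eliminated independently of one another, i.e., as a single vector operation. All function names are hypothetical.

```python
def tridiagonal(n, diag=2.0, off=-1.0):
    """Dense n x n tridiagonal test matrix (values chosen for illustration)."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = diag
        if i + 1 < n:
            a[i][i + 1] = off
            a[i + 1][i] = off
    return a


def odd_even_order(n):
    """Even-numbered unknowns first, then odd-numbered ones (0-based)."""
    return list(range(0, n, 2)) + list(range(1, n, 2))


def permute_symmetric(a, order):
    """Apply the same ordering to rows and columns: B[i][j] = A[p[i]][p[j]]."""
    return [[a[i][j] for j in order] for i in order]


if __name__ == "__main__":
    b = permute_symmetric(tridiagonal(6), odd_even_order(6))
    for row in b:
        print(" ".join(f"{v:5.1f}" for v in row))
```

The printed matrix has two diagonal blocks on its diagonal, with all coupling confined to the off-diagonal blocks.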


PARTS  I  and  II 

A.  CRAY-1  Simulator 

In  1978,  a  CRAY-1  logical  and  clock-level  timing  simulator  and 
a  cross  assembler  were  implemented  for  the  Amdahl  470.  This  program 
was  subsequently  converted  to  the  CRAY-1  by  Los  Alamos  Scientific 
Laboratory  and  Lawrence  Livermore  Laboratory  to  study  critical  high 
performance  algorithms  which  even  the  CRAY-1,  without  an  interrupt 
capability,  cannot  itself  monitor. 


B.  Equation  Solving  Codes


A number of high performance codes were developed with the
aid of the CRAY-1 simulator. Among other notable results, it was
shown that vector accumulation loops, encountered in nearly all
linear algebra codes, could be more efficiently implemented by
avoiding functional unit chaining, supposedly a feature of the
CRAY-1 designed to produce concurrent operation of functional units.
These codes have been documented in [20], and have been sent to
Cray Research, to Bell Telephone Laboratories, and to Los Alamos
Scientific Laboratory, at their request.
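For readers unfamiliar with the term, a vector accumulation loop can be sketched as follows. This Python fragment is only a structural illustration, assuming a dot product as the accumulation and strips of 64 elements to match the CRAY-1 vector register length. It does not model functional unit chaining or timing, and it is not one of the codes documented in [20]. The function name is hypothetical.

```python
VECTOR_LENGTH = 64  # elements per CRAY-1 vector register


def dot_strip_mined(x, y, vl=VECTOR_LENGTH):
    """Dot product accumulated strip by strip into vl partial sums.

    Each strip contributes one elementwise multiply and one elementwise add
    into the accumulator; the partial sums are reduced to a scalar only once,
    after the last strip.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * vl
    for start in range(0, len(x), vl):
        stop = min(start + vl, len(x))
        for lane in range(stop - start):  # stands in for one vector instruction
            acc[lane] += x[start + lane] * y[start + lane]
    return sum(acc)


if __name__ == "__main__":
    x = [float(i) for i in range(1, 201)]
    y = [2.0] * 200
    print(dot_strip_mined(x, y))
```

Keeping the accumulator as a full vector until the end is what makes the loop body a sequence of vector operations; how those operations are best scheduled on the functional units is the question the simulator was used to answer.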

PART  II 

A.  Aerodynamic  Fluid  Flow 

In 1978 and 1979, kernels of an explicit Navier-Stokes code
developed at AFFDL were coded in both CRAY-1 assembly language and
CRAY-1 Fortran. Results are reported in [3] and [6], joint papers
with AFFDL. A summary of the speedups obtained vis-a-vis the CDC
6600 and 7600 is given in Table 1 below.


COMPUTER      CODE         RATIO

CYBER 74      Scalar         1.0
CDC 7600      Scalar         5.2
CRAY-1        Scalar        15.7
CRAY-1        Vector       127.7
CRAY-1        Assembly     144.2

Table 1.  Relative execution rates of computers in explicit solution.
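The ratios in Table 1 are all relative to the scalar CYBER 74 code, so other comparisons follow by division. The short Python sketch below does that arithmetic; the data are copied from Table 1, while the function and variable names are hypothetical.

```python
# (computer, code) -> execution rate relative to CYBER 74 scalar, from Table 1
RATES = {
    ("CYBER 74", "Scalar"): 1.0,
    ("CDC 7600", "Scalar"): 5.2,
    ("CRAY-1", "Scalar"): 15.7,
    ("CRAY-1", "Vector"): 127.7,
    ("CRAY-1", "Assembly"): 144.2,
}


def relative_speedup(numerator, denominator, rates=RATES):
    """Speedup of one table entry over another, as a ratio of their rates."""
    return rates[numerator] / rates[denominator]


if __name__ == "__main__":
    comparisons = [
        (("CRAY-1", "Vector"), ("CRAY-1", "Scalar")),
        (("CRAY-1", "Assembly"), ("CRAY-1", "Vector")),
        (("CRAY-1", "Scalar"), ("CDC 7600", "Scalar")),
    ]
    for num, den in comparisons:
        print(f"{num} over {den}: {relative_speedup(num, den):.2f}")
```

On these figures, vectorization accounts for roughly a factor of eight over scalar code on the CRAY-1, with assembly coding adding a further gain of about 13 percent over vectorized Fortran.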


III.  Coupling  Activities 


A.  Seminars  on  vector  processing 

1.  Two  seminars  at  Los  Alamos  Scientific  Laboratory.

2.  One  seminar  at  the  University  of  Minnesota. 

3.  One  seminar  at  AFFDL. 


B.  Visits 

1.  One  visit  to  AFWL. 

2.  Two  visits  to  AFFDL,  prior  to  institution  of  funding. 


C.  Industrial  Consulting 

1. With General Electric and EPRI, on the vector analysis
of electric power system grids, resulting in a report
[30].

2. With Mobil Research and Development (Dallas) on the
study of vectorization of 3-D diffusion codes associated
with oil reservoir drilling and management.

3.  With  Bell  Telephone  Laboratories,  to  study  the  solution 
of  sparse  equations  representing  communication  systems. 

D.  Other

1. A one-week short course at the University of Michigan on
High Speed Computation was taught in 1977, 1978, and 1979.
Among 41 attendees in 1979 were representatives of Air
Force Headquarters (Pentagon), AFWL, RADC, and AFFDL, as
well as NRL and Picatinny Arsenal.

2. Visiting Research Scientist at the Los Alamos Scientific
Laboratory (1979- ) to evaluate particle physics codes.


3. Visiting scientist at AFFDL (1975- ) to assist in
development of vectorized explicit/implicit fluids
codes.
Figure 2. Average vector lengths for direct solution of finite
element grids, with different definitions of a vector
primitive.


Figure 3. Connectivity graph of finite difference grid;
rotation of quadrants (a) into single quadrant
representation (b); rotation with cut and creation
of nodes ((b) - (c)).